Transcriptomic data exploration of consensus genes and molecular mechanisms between chronic obstructive pulmonary disease and lung adenocarcinoma

Most current research has focused on chronic obstructive pulmonary disease (COPD) and lung adenocarcinoma (LUAD) separately; however, it is important to understand the complex mechanisms by which COPD progresses to LUAD. This study is the first to explore the unique and shared molecular mechanisms in the pathogenesis of COPD and LUAD across several datasets using a variety of analysis methods. We used weighted correlation network analysis to identify hub genes in two datasets from public databases: GSE10072 and GSE76925. We explored the unique and shared molecular signatures of the two diseases through enrichment analysis, immune infiltration analysis, and therapeutic target analysis. Finally, the results were confirmed using real-time quantitative reverse transcription PCR (RT-qPCR). Fifteen hub genes were identified: GPI, EZH2, EFNA4, CFB, ENO1, SH3PXD2B, SELL, CORIN, MAD2L1, CENPF, TOP2A, ASPM, IGFBP2, CDKN2A, and ELF3. For the first time, SELL, CORIN, GPI, and EFNA4 were found to play a role in the etiology of COPD and LUAD. The LUAD genes identified were primarily involved in the cell cycle and DNA replication; the COPD genes were related to ubiquitin-mediated proteolysis, the ribosome, and T/B-cell receptor signaling pathways. The tumor microenvironment in LUAD pathogenesis was influenced by CD4+ T cells, type 1 regulatory T cells, and T helper 1 cells, whereas T follicular helper cells, natural killer T cells, and B cells all affected immunological inflammation in COPD. Drug target analysis suggests that cisplatin and tretinoin, as well as bortezomib and metformin, may be potential targeted therapies for patients with COPD combined with LUAD. These signatures may provide a new direction for developing early interventions and treatments to improve the prognosis of COPD and LUAD.


Materials and methods
Data collection and processing. We used "COPD," "LUAD," and "Homo sapiens" as keywords to retrieve transcriptome profiles of COPD and LUAD datasets from the Gene Expression Omnibus (GEO). We found two expression datasets with readily available data that matched our search criteria: GSE10072 and GSE76925. GSE10072 contains gene expression profiles of 58 LUAD patients and 16 healthy subjects (58 lung tumor tissues and 49 normal lung tissues) (Supplementary Table S1). GSE76925 consists of gene expression profiles from 111 COPD patients and 40 healthy individuals (Supplementary Table S2). Further details of the two datasets are given in Supplementary Table S3. The GSE10072 data, generated on a single-channel Affymetrix array, were processed and normalized with the Robust Multichip Average (RMA) method using the Bioconductor affy package 10 . The GSE76925 data were processed in R by log2-transforming the original matrix, followed by background correction and quantile normalization 11 . We downloaded the two processed gene expression datasets. Probe IDs were then mapped to gene symbols using the illuminaHumanv4.db R package; when multiple probes mapped to the same gene symbol, the probe with the highest average expression was retained. The limma package in R was then used to identify differentially expressed genes (DEGs) for COPD and LUAD based on two criteria: the adjusted P value (adj.P.Val) and the log2 fold change (log2FC). Genes with adj.P.Val < 0.05 and |log2FC| ≥ 1 were considered DEGs. Finally, the expression matrices of GSE10072 and GSE76925 were produced. Figure 1 shows a flowchart of all steps involved in our analysis.
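The probe-collapsing rule and the DEG thresholds described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' R pipeline, which used limma. The function names and toy probe IDs are hypothetical, and the use of the Benjamini-Hochberg procedure to obtain `adj.P.Val` is an assumption based on the default adjustment method of limma's `topTable`.

```python
from statistics import mean


def collapse_probes(probe_expr, probe_to_symbol):
    """For each gene symbol, keep the probe with the highest mean expression."""
    best = {}  # symbol -> (mean expression, expression values)
    for probe, values in probe_expr.items():
        symbol = probe_to_symbol.get(probe)
        if not symbol:
            continue  # probes without a gene symbol are dropped
        avg = mean(values)
        if symbol not in best or avg > best[symbol][0]:
            best[symbol] = (avg, values)
    return {symbol: values for symbol, (_, values) in best.items()}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (assumed method), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(adj_p, log2fc, p_cut=0.05, fc_cut=1.0):
    """DEG rule from the text: adj.P.Val < 0.05 and |log2FC| >= 1."""
    return adj_p < p_cut and abs(log2fc) >= fc_cut
```

For example, two probes that map to the same symbol are reduced to the one with the higher mean, and unannotated probes are discarded before the DEG thresholds are applied.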

Identification of DEGs and candidate genes. DEGs for COPD and LUAD were identified using limma, an R package based on linear models, with t tests used to detect differential expression. As mentioned earlier, the P values were corrected for multiple testing to obtain adjusted P values (adj.P.Val), and adj.P.Val and log2FC were used to screen for DEGs. These DEGs were used as a verification set. Likewise, genes meeting the criterion adj.P.Val < 0.05 were selected as candidate genes in the two datasets, and our subsequent analyses focused on these genes.
www.nature.com/scientificreports/
Weighted gene co-expression network analysis (WGCNA). WGCNA groups genes into modules according to their co-expression patterns in order to identify biologically relevant information. For the COPD-specific and LUAD-specific analyses, the candidate genes (adj.P.Val < 0.05) were used as input to WGCNA. For the consensus analysis, we followed the consensus network analysis tutorial on the WGCNA website.
To explore correlated gene modules thoroughly, the WGCNA package in R was applied to the candidate genes. First, we clustered the samples in the expression matrix (distinct from the candidate-gene clustering described later) to check for obvious outlier samples. We then loaded the clinical characteristics of each dataset and matched each sample to its clinical traits. Next, network construction and module detection were performed: genes with similar expression patterns were grouped into modules, which were identified from a hierarchical clustering dendrogram. Modules with clinical significance were selected and discussed. Gene significance (GS) and module membership (MM) were the two key indicators used to identify modules closely related to clinical features. Finally, we selected the modules most strongly related to specific clinical features for further analysis.
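The core WGCNA quantities named above can be sketched in plain Python to show what GS and MM measure. This is an illustrative reimplementation under stated assumptions, not the WGCNA package: it assumes an unsigned network with soft-threshold power `beta = 6` (the paper does not report its chosen power), defines the module eigengene as the first principal component of the standardized module genes (computed here by power iteration), and assumes every gene varies across samples. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)| ** beta (beta assumed)."""
    return {(g, h): abs(pearson(expr[g], expr[h])) ** beta for g in expr for h in expr}


def zscore(values):
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd for v in values]


def module_eigengene(expr, module_genes, iterations=100):
    """First principal component (sample space) of the standardized module genes."""
    rows = [zscore(expr[g]) for g in module_genes]
    n_samples = len(rows[0])
    # Average standardized profile: starting vector and reference for the sign.
    avg = [sum(r[s] for r in rows) / len(rows) for s in range(n_samples)]
    v = avg[:]
    for _ in range(iterations):
        # Power iteration with X^T X, where X is the genes x samples matrix.
        u = [sum(r[s] * v[s] for s in range(n_samples)) for r in rows]
        w = [sum(rows[k][s] * u[k] for k in range(len(rows))) for s in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]  # orient the eigengene like the average expression
    return v


def gene_significance(expr, trait):
    """GS: absolute correlation between each gene and a clinical trait."""
    return {g: abs(pearson(x, trait)) for g, x in expr.items()}


def module_membership(expr, eigengene):
    """MM: correlation between each gene and the module eigengene."""
    return {g: pearson(x, eigengene) for g, x in expr.items()}
```

Genes with both high GS for a disease trait and high MM for a module are the ones this kind of analysis flags as candidates within clinically relevant modules.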
Screening for hub genes. To improve the accuracy of hub-gene identification, we took the intersection of the candidate genes retained after WGCNA and the disease DEGs.
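The hub-gene rule above amounts to a set intersection. A minimal Python sketch follows; the gene names in the usage are hypothetical placeholders, and no additional GS or MM cutoffs are applied because none are stated in the text.

```python
def hub_genes(module_genes, deg_genes):
    """Genes present both in a selected WGCNA module and in the disease DEG list."""
    # Sorting only makes the output deterministic; sets are unordered.
    return sorted(set(module_genes) & set(deg_genes))
```

For example, `hub_genes(["GENE_A", "GENE_B", "GENE_C"], ["GENE_C", "GENE_B", "GENE_D"])` returns `["GENE_B", "GENE_C"]`.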
Pathway enrichment analysis. To perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of genes in significant modules and to plot the relevant graphs, we used the clusterProfiler package in R. GO and KEGG are both publicly available databases [12][13][14][15][16] . As in the DEG screening, adj.P.Val < 0.05 was used as the significance threshold in the enrichment analysis.
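Over-representation analysis of the kind run by clusterProfiler's `enrichGO` and `enrichKEGG` rests on a one-sided hypergeometric test for each GO term or KEGG pathway. The sketch below shows that test for a single term using only the standard library. The parameter names are illustrative, and in practice the resulting P values are then adjusted for multiple testing (Benjamini-Hochberg by default in clusterProfiler) before the adj.P.Val < 0.05 cutoff is applied.

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """One-sided over-representation P value, P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the GO term or KEGG pathway
    n: genes in the query list (e.g. one WGCNA module)
    k: query genes annotated to the term
    """
    upper = min(n, K)
    # Exact integer arithmetic for the tail; math.comb returns 0 when n - i > N - K.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For instance, if 5 of 10 module genes fall in a term that covers 10 of 100 background genes (about 1 expected by chance), the P value is well below 0.01.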

Immune cell infiltration analysis.
To estimate the abundance of 24 immune cell types in COPD and LUAD, the GSE76925 and GSE10072 gene expression data were uploaded to the Immune Cell Abundance Identifier (ImmuCellAI) online analysis website 17 . Box plots were used to show differences in immune cell infiltration between the disease and normal groups, and heatmaps were produced to visualize the relationships between the 24 immune cell types and key genes. Plots were drawn with the R packages "ggplot2" and "cowplot". Immune cell types with P < 0.05 were considered significant.
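The text does not state which correlation coefficient underlies the immune cell-gene heatmaps. The sketch below assumes Spearman rank correlation, a common choice for abundance scores, and computes it for every (cell type, gene) pair with the standard library. Function names and inputs are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (assumed method) = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def cell_gene_correlations(cell_abundance, gene_expression):
    """Correlation for every (immune cell type, gene) pair, e.g. for a heatmap."""
    return {
        (cell, gene): spearman(abundance, expression)
        for cell, abundance in cell_abundance.items()
        for gene, expression in gene_expression.items()
    }
```

Rank-based correlation is less sensitive to outlying abundance estimates than Pearson correlation, which is one reason it is often preferred for this kind of plot.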
Identification of candidate drug-targets. To identify candidate drug targets in COPD and LUAD, the genes of the important WGCNA modules were uploaded to the Drug-Gene Interaction Database (DGIdb) 18 , which automatically retrieves the known drug interactions for all submitted genes. The results were then plotted in R as drug-target network maps, and the drug with the most connections in each network was selected as a potential drug target.
… MAPK, and PI3K-Akt signaling pathway. GO enrichment analysis showed that the genes were mainly located in the cytosol and plasma membrane. KEGG pathway enrichment analysis of the COPD blue module identified ubiquitin-mediated proteolysis, ribosome, and T/B-cell receptor signaling pathways. GO enrichment results for the blue module included protein binding and cytosol. The KEGG pathways for the consensus turquoise module were oxidative phosphorylation, proteasome, spliceosome, and … Figure S5. Second, we examined the correlations between infiltrating immune cells and key genes. In COPD, NKT cells were strongly positively correlated with hub genes, including GPI and ENO1, whereas Th17 cells were negatively correlated with EZH2 and SELL. The results also revealed a negative correlation between monocytes and CFB. In LUAD, the Th1-cell level was most strongly associated with MAD2L1 and GPI, whereas CENPF was most negatively correlated with macrophages and NK cells (Supplementary Figure S6).

Identification of candidate drug-targets.
The drug most frequently found in the LUAD blue and turquoise modules was cisplatin; the most prominent drug in the COPD blue module was tretinoin. In the COPD-LUAD turquoise module, bortezomib and metformin were identified as the main drug targets. The results obtained from DGIdb are presented in Fig. 5.
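Selecting the drug with the most connections in a module's drug-gene network can be sketched as ranking drugs by the number of distinct module genes they interact with. This assumes the DGIdb results have already been exported as (drug, gene) pairs; that input format, the function name, and the placeholder identifiers are hypothetical and do not reflect DGIdb's own output schema.

```python
from collections import defaultdict


def rank_drugs(interactions):
    """Rank drugs by the number of distinct module genes they interact with."""
    genes_per_drug = defaultdict(set)
    for drug, gene in interactions:
        genes_per_drug[drug].add(gene)  # duplicate pairs count once
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(
        ((drug, len(genes)) for drug, genes in genes_per_drug.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

The first entry of the returned list corresponds to the most connected drug node in the network.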
Validation by real-time quantitative reverse transcription PCR. RT-qPCR was used to confirm the differential expression of hub genes. Overall, when COPD and LUAD samples were compared with normal controls, 10 of the 15 genes analyzed were significantly differentially expressed, consistent with the microarray analysis. The COPD-specific genes (CORIN, SELL, SH3PXD2B) and LUAD-specific key genes (ASPM, CENPF, MAD2L1, TOP2A, CDKN2A, ELF3, IGFBP2) were expressed at higher levels than in normal controls. In contrast, TRAF3IP3, BHLHE22, MUC5B, CEACAM5, and BUB1B showed no differences in expression. The results are shown in Fig. 6.
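The section does not state how relative expression was quantified from the RT-qPCR data. A widely used option is the 2^(-ΔΔCt) (Livak) method. The sketch below assumes that method, a single reference gene, and group-mean Ct values. The function name and example Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene in cases vs controls by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values; group means are used.
    """
    d_ct_case = mean(ct_target_case) - mean(ct_ref_case)  # normalize to reference gene
    d_ct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    dd_ct = d_ct_case - d_ct_ctrl  # case relative to control
    return 2 ** (-dd_ct)
```

Because a lower Ct means earlier amplification, a target that crosses the threshold two cycles earlier in cases than in controls (after reference-gene normalization) gives a fold change of 4, that is, higher expression in cases.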

Discussion
Because of the threat they pose to human survival and the toll they take on society, it is extremely important to explore the biological mechanisms of COPD and LUAD to improve their treatment and prevention. COPD and LUAD are both common respiratory diseases, with differences and similarities. Previous … the findings with RT-qPCR experimental verification. The functions, molecular mechanisms and pathways of key genes were then analyzed with GO and KEGG to provide insight into these genes. We also found significant differences in immune cell infiltration between the disease and normal groups, and key genes were found to be involved in many immune responses and in immune cell chemotaxis. Meanwhile, we obtained drug targets from the DGIdb database for four modules related to disease pathogenesis. This study bridges the gap between omics analyses and clinical applications by examining the unique and shared biological mechanisms of COPD and LUAD pathogenesis in multiple datasets and from multiple perspectives. It may provide new directions for the development of early interventions and treatments to improve COPD and LUAD prognosis.
We obtained four hub genes, MAD2L1, CENPF, TOP2A, and ASPM, in the LUAD blue module. Mitotic arrest deficiency protein 2 (MAD2L1) and abnormal spindle microtubule assembly (ASPM) have been identified as vital mediators of chromosomal control pathways. Topoisomerase II alpha (TOP2A), a type IIA topoisomerase, regulates DNA topology by relaxing positive and negative supercoils during DNA replication and transcription, and resolves chromosome entanglement so that sister chromatids can separate. CENPF, located on chromosome 1q41, encodes centromere protein F, a component of the centromere-kinetochore complex 22 .
Most cancers are associated with genomic instability. Dysregulation of cell cycle pathways, including spindle assembly, results in unstable chromosomal structures or extensive aneuploidy, which leads to tumorigenesis 23 . MAD2L1 participates in cell cycle control through the mitotic spindle assembly checkpoint, and overexpression of MAD2L1 has been associated with susceptibility to lung carcinoma 24,25 . Our experiments further confirmed that MAD2L1 expression was higher in LUAD tissue than in normal controls. These findings also suggest that increased genetic diversity contributes to altered tumor survival and chemoresistance, and that disruption of cell cycle pathways, including spindle assembly, has been a focus of chemotherapeutic drug development, as with paclitaxel and colchicine-based agents. Correlation analysis between WGCNA modules and clinical traits showed that the turquoise module had the strongest negative correlation with disease. Five hub genes (IGFBP2, CDKN2A, MUC5B, CEACAM5, ELF3) were identified in the LUAD turquoise module, and these genes may be closely related to tumorigenesis in LUAD. Cyclin-dependent kinase inhibitor 2A (CDKN2A) is an essential cell cycle regulator, and a previous study found that loss of CDKN2A promoted lung cancer progression and was correlated with poor survival 26 .
We found that three genes (SH3PXD2B, CORIN, SELL) were overexpressed in the COPD group. SELL encodes L-selectin, also known as CD62L, a type I transmembrane glycoprotein and cell adhesion molecule. CORIN is a member of the trypsin superfamily of type II transmembrane serine proteases. To date, there is no direct evidence that CORIN and SELL are involved in the development and progression of COPD; to the best of our knowledge, this is the first report of an association of CORIN and SELL with COPD. Nevertheless, the effects of CORIN and SELL on inflammation and immunity have been demonstrated 27,28 . SH3PXD2B (SH3 and PX domains 2B) encodes an adaptor protein of the same name, which triggers elastase production in the extracellular matrix (ECM). Elastase degrades pulmonary elastin, leading to emphysema and thereby affecting the development and progression of COPD 29 .
The five hub genes in the COPD-LUAD consensus module were EZH2, EFNA4, CFB, ENO1 and GPI. Interestingly, this is the first report of ENO1 and GPI as genes involved in the shared pathogenesis of COPD and LUAD. The glycolytic enzyme enolase 1 (ENO1) catalyzes the conversion of 2-phosphoglycerate to phosphoenolpyruvate, sustaining aerobic glycolysis 30 . The present study found that ENO1 is overexpressed in LUAD, consistent with published studies. ENO1 is involved in proliferation, invasion, metastasis and tumor progression in LUAD through glycolysis and the PI3K/Akt pathway 31 . Patients with NSCLC and high ENO1 expression had relatively short disease-free survival and overall survival, and ENO1 expression was positively correlated with TNM stage 32 . The role of ENO1 in COPD has not been reported. However, because COPD is a long-term chronic inflammatory disease, neutrophil function has previously been reported to influence its pathogenesis to some degree, and neutrophils rely on the glycolytic pathway and on intracellular glycogen reserves to maintain their cellular functions 33 . Impairment of the ENO1-dependent glycolytic pathway may therefore affect neutrophil function. Consequently, these findings support the idea that overexpression of these genes plays a significant role in COPD, LUAD, or both, and that they might be therapeutic targets.
Immune cell infiltration plays a crucial role in disease progression. Identifying precise diagnostic markers and assessing the pattern of immune cell infiltration have far-reaching implications for improving prognosis. The lungs of patients with COPD are prone to inflammation, which is linked to aberrant immune responses. According to our findings, immune cell infiltration in COPD and LUAD differed considerably from that in normal controls. The abundance of Tfh cells, NKT cells and B cells was considerably higher in COPD than in the control group. Tfh cells are a type of CD4+ T cell that promotes B-cell survival, affinity maturation, and class-switch recombination. However, an overactive Tfh cell response can result in a variety of autoimmune disorders, including rheumatoid arthritis 34 . Tfh cells have been observed in the early stages of COPD (GOLD I/II), which is consistent with the findings of this study. B cells and NKT cells are both lymphocytes 35 . Previous research reported an increase in NKT cells in the bronchoalveolar lavage fluid and induced sputum of patients with COPD, as well as NKT-cell cytotoxicity against autologous lung epithelial cells 36 . The increased abundance of Tfh, NKT and B cells in patients with COPD may help explain the relationship between lung inflammation and the immune response in COPD, potentially paving the way for disease-targeted immunotherapy.
The abundance of CD4+ T cells, Tr1 cells, and Th1 cells was significantly higher in LUAD samples than in the control group. CD4+ T cells are essential in host defense, immune regulation, and autoimmune disease, and they have been found to increase steadily during the transition from normal lung tissue to LUAD 37 . Th1 cells are helper T cells that secrete cytokines regulating cell development, differentiation, inflammation, and immune responses 38,39 . We found higher Th1 abundance in LUAD, which contrasts with previous research reporting lower Th1 cytokine levels in lung cancer patients, an antitumor role for Th1 cells, and a tumor-promoting role for Th2 cells 40,41 . The main reason may be that the study subjects differ: this research focuses specifically on LUAD, a lung cancer subtype. Second, tumor occurrence and recurrence are complex processes that may be influenced by patient-specific changes in Th1 and Th2 cytokines. The exact regulatory mechanism remains unclear.
In terms of potential drug targets, we identified cisplatin and tretinoin, as well as bortezomib and metformin. Cisplatin-based chemotherapy is currently a cornerstone of LUAD treatment. Cisplatin acts by cross-linking guanine and adenine bases in DNA, which blocks DNA repair, causes permanent DNA damage, and subsequently induces apoptosis of cancer cells 42 . Although tretinoin has not been reported to improve the prognosis of COPD, a basic study found that tretinoin treatment significantly abrogated elastase-induced pulmonary emphysema in rats 43 . Bortezomib was consistently identified as an important drug target in the analysis of the COPD-LUAD module. Inhibition of the proteasome disrupts protein homeostasis, which adversely affects cellular signaling cascades 44 . Bortezomib and carfilzomib are both proteasome inhibitors and are among the major backbone drugs in oncology. Although bortezomib is primarily used in treatment regimens for multiple myeloma, it has been reported to have potential benefits in the treatment of lung carcinoma 45 . Recent studies suggest that the proteasome may be a potential therapeutic target for restoring respiratory muscle function in patients with COPD 46,47 . Metformin may reduce the accumulation of advanced glycation end products by activating AMP-activated protein kinase (AMPK), thereby reducing airway inflammation, increasing lung capacity, and possibly improving the prognosis of COPD 48,49 . Furthermore, a previous study reported that metformin use was inversely associated with lung cancer risk 50 . Consequently, cisplatin and tretinoin, as well as bortezomib and metformin, may be potential targeted therapies for patients with COPD combined with LUAD. However, drug repurposing currently faces several difficulties.
With the exception of cisplatin, which is more likely to cause drug resistance, all of the drugs we evaluated are currently used to treat other diseases. With the continued development of druggable genome approaches, which directly detect genomic sequences to relate sequence changes to pharmacological effects, we can anticipate drug repurposing in the future. Notably, this technology is already used in clinical settings, including genetic testing for cardiovascular medications 51 . Our research has some limitations. First, the sample size is limited; more samples and prospective investigations are needed to fully examine and validate our findings. Second, we lacked datasets and lung tissues from patients with COPD combined with LUAD. In the future, we will use clinically matched samples of COPD combined with LUAD to confirm the protein expression levels of the hub genes by western blot analysis. Third, the drug targets identified in this study are based only on a predictive analysis of the major disease modules and have yet to be empirically validated. Based on their theoretical mechanisms, the drugs we identified may be suitable for improving the prognosis of both disorders, but more clinical investigations are needed to establish the validity and reliability of these findings. Finally, the datasets in this study include demographic information, and differences in demographic characteristics between the disease and control groups may have biased the results of our analysis.
In summary, our research identified important genes linked to COPD and LUAD, both individually and jointly. CORIN and SELL, as well as EFNA4 and CFB, were identified for the first time as playing a role in the etiology of COPD and LUAD. We found increased abundance of specific immune cell types in the immune microenvironment of COPD and LUAD patients. We also found that cisplatin and tretinoin, as well as bortezomib and metformin, may be potential targeted therapies for patients with COPD combined with LUAD. Overall, we explored the unique and shared molecular mechanisms of COPD and LUAD pathogenesis in multiple datasets and from multiple perspectives, providing a new direction for developing early interventions and treatments to improve the prognosis of COPD and LUAD.

Data availability
The COPD and LUAD datasets analyzed in this study are available from the public Gene Expression Omnibus database under accession numbers GSE10072 and GSE76925. All procedures were performed in accordance with the relevant guidelines and regulations. Further inquiries can be directed to the corresponding author.